Rank | Count | Beginning |
---|---|---|
122394 | 7505 | Na |
259316 | 7353 | V |
8 | 5950 | A |
243047 | 4319 | To |
71707 | 3982 | Je |
170259 | 3562 | Podle |
168756 | 3186 | Po |
3654 | 3097 | Ale |
91380 | 2373 | Když |
261699 | 2322 | Ve |
65691 | 2256 | Jak |
176152 | 2008 | Pokud |
41771 | 1967 | Do |
284378 | 1858 | Za |
32145 | 1590 | Co |
196328 | 1560 | Pro |
64827 | 1552 | Já |
238301 | 1310 | Ten |
230504 | 1302 | Tak |
284366 | 1263 | Z |
151892 | 1227 | O |
62278 | 1210 | I |
257919 | 1189 | Už |
154727 | 1164 | Od |
73928 | 1102 | Jeho |
192588 | 1031 | Při |
35468 | 980 | Další |
202582 | 977 | První |
44274 | 948 | Domácí |
211590 | 934 | S |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N = 1.
The most frequent word N-grams at the beginning of sentences give some insight into how sentences are composed.
For N = 1 in particular, even a small corpus suffices to identify the most frequent sentence beginnings.
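```sql
-- Group sentences by everything before the first space (N = 1)
-- and list the 50 most frequent sentence beginnings.
SELECT SUBSTRING_INDEX(sentence, ' ', 1) AS beg,
       COUNT(*) AS cnt
FROM sentences
GROUP BY SUBSTRING_INDEX(sentence, ' ', 1)
ORDER BY cnt DESC
LIMIT 50;
```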
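The same counting can be sketched outside the database. This is a minimal sketch, assuming the sentences are available as a list of strings; the function names `sentence_beginning` and `top_beginnings` are hypothetical, and the sample sentences are invented for illustration, not taken from the corpus. `sentence_beginning` imitates `SUBSTRING_INDEX(sentence, ' ', n)` for positive n, so passing n = 2, 3, 4 gives the multi-word beginnings of the following subsections.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def sentence_beginning(sentence: str, n: int) -> str:
    """Return the first n space-separated tokens of a sentence.

    Imitates MySQL's SUBSTRING_INDEX(sentence, ' ', n) for n >= 1:
    everything before the n-th space, or the whole sentence if it
    contains fewer than n spaces. Punctuation stays attached to tokens.
    """
    return " ".join(sentence.split(" ", n)[:n])


def top_beginnings(sentences: Iterable[str], n: int = 1,
                   limit: int = 50) -> List[Tuple[str, int]]:
    """Count n-word sentence beginnings and return the `limit` most frequent."""
    counts = Counter(sentence_beginning(s, n) for s in sentences)
    # most_common sorts by count, descending; equal counts keep first-seen order
    return counts.most_common(limit)


# Hypothetical toy input, not corpus data.
SAMPLE = [
    "V Praze dnes prší.",
    "V Brně svítí slunce.",
    "Na trhu je klid.",
    "V Praze je klid.",
    "Ano.",
]

if __name__ == "__main__":
    for n in range(1, 5):
        print(n, top_beginnings(SAMPLE, n, limit=3))
```

One caveat when comparing the two: Python compares strings case-sensitively, whereas MySQL groups according to the column's collation, which is often case-insensitive, so counts for the same data may differ slightly. Tie order also differs, since SQL leaves the order of equal counts unspecified.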
4.3.1.2 Most Frequent Sentence Beginnings II
4.3.1.3 Most Frequent Sentence Beginnings III
4.3.1.4 Most Frequent Sentence Beginnings IV
4.3.1.1 Most Frequent Sentence Endings I
4.3.1.2 Most Frequent Sentence Endings II
4.3.1.3 Most Frequent Sentence Endings III
4.3.1.4 Most Frequent Sentence Endings IV